Reference-guided Assembly of Metagenomic Sequences

نویسندگان

  • Victoria Cepeda
  • Bo Liu
  • Mathieu Almeida
  • Christopher M. Hill
  • Mihai Pop
چکیده

Metagenomic studies have primarily relied on de novo approaches for reconstructing genes and genomes from microbial mixtures. While database driven approaches have been employed in certain analyses, they have not been used in the assembly of metagenomic data. This is in part due to the small size and biased coverage of public genome databases, but also due to the inherent computational cost of mapping tens of millions of reads to thousands of full genome sequences. Here we describe the first effective approach for reference-guided metagenomic assembly that can complement and improve upon de novo metagenomic assembly methods. Combined with de novo assembly approaches, we show that MetaCompass is able to generate significantly better results than can be obtained by either comparative or de novo assembly independently. Using this approach we report improved assemblies for 688 metagenomic samples from the Human Microbiome Project.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

GRASP: Guided Reference-based Assembly of Short Peptides

Protein sequences predicted from metagenomic datasets are annotated by identifying their homologs via sequence comparisons with reference or curated proteins. However, a majority of metagenomic protein sequences are partial-length, arising as a result of identifying genes on sequencing reads or on assembled nucleotide contigs, which themselves are often very fragmented. The fragmented nature of...

متن کامل

A Graph-Centric Approach for Metagenome-Guided Peptide and Protein Identification in Metaproteomics

Metaproteomic studies adopt the common bottom-up proteomics approach to investigate the protein composition and the dynamics of protein expression in microbial communities. When matched metagenomic and/or metatranscriptomic data of the microbial communities are available, metaproteomic data analyses often employ a metagenome-guided approach, in which complete or fragmental protein-coding genes ...

متن کامل

MetaCRAST: reference-guided extraction of CRISPR spacers from unassembled metagenomes

Clustered regularly interspaced short palindromic repeat (CRISPR) systems are the adaptive immune systems of bacteria and archaea against viral infection. While CRISPRs have been exploited as a tool for genetic engineering, their spacer sequences can also provide valuable insights into microbial ecology by linking environmental viruses to their microbial hosts. Despite this importance, metageno...

متن کامل

Clustering of Short Read Sequences for de novo Transcriptome Assembly

Given the importance of transcriptome analysis in various biological studies and considering thevast amount of whole transcriptome sequencing data, it seems necessary to develop analgorithm to assemble transcriptome data. In this study we propose an algorithm fortranscriptome assembly in the absence of a reference genome. First, the contiguous sequencesare generated using de Bruijn graph with d...

متن کامل

Reference-independent comparative metagenomics using cross-assembly: crAss

MOTIVATION Metagenomes are often characterized by high levels of unknown sequences. Reads derived from known microorganisms can easily be identified and analyzed using fast homology search algorithms and a suitable reference database, but the unknown sequences are often ignored in further analyses, biasing conclusions. Nevertheless, it is possible to use more data in a comparative metagenomic a...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2016